Incorporating an Error Corpus into a Spellchecker for Maltese

Authors

  • Mike Rosner
  • Albert Gatt
  • Andrew Attard
  • Jan Joachimsen
Abstract

This paper discusses the ongoing development of a new Maltese spell checker, highlighting the methodologies which would best suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS) which now includes a representative corpus of c. 100M words extracted from diverse documents including the Maltese Legislation, press releases and extracts from Maltese web-pages and (ii) an extensive and detailed corpus of spelling errors that was collected whilst part of the MLRS texts were being prepared. We describe the structure of these resources as well as the experimental approaches focused on context that we are now in a position to adopt. We describe the framework within which a variety of different approaches to spellchecking and evaluation will be carried out, and briefly discuss the first baseline system we have implemented. We conclude the paper with a roadmap for future improvements.


Similar resources

Ordering the suggestions of a spellchecker without using context

Having located a misspelling, a spellchecker generally offers some suggestions for the intended word. Even without using context, a spellchecker can draw on various types of information in ordering its suggestions. A series of experiments is described, beginning with a basic corrector that implements a well known algorithm for reversing single simple errors, and making successive enhancements t...


MalToBI – Building an Annotated Corpus of Spoken Maltese

Research on the phonetics and phonology of Maltese, and in particular on different aspects of its prosody, is thus far rather limited. This is in part due to the lack of structured resources for use in research. One resource which, to date, has been unavailable is a corpus of spoken Maltese. Such a corpus could, amongst other things, be used as a ready resource for the analysis of various a...


Semi-automated typical error annotation for learner English essays: integrating frameworks

This paper proposes integration of three open source utilities: brat web annotation tool, Freeling suite of linguistic analyzers and Aspell spellchecker. We demonstrate how their combination can be used to preannotate texts in a learner corpus of English essays with potential errors and ease human annotators’ work. Spellchecker alerts and morphological analyzer tagging probabilities are used to...


Isolated-word Error Correction for Partially Phonemic Languages using Phonetic Cues

Partially phonemic languages use writing systems that fall between strictly phonemic and non-phonemic orthography. Phonetic errors are therefore very frequent in such languages. This paper introduces an approach to developing spellcheckers for partially phonemic languages that uses grapheme-to-phoneme mapping for isolated-word error correction. Since a complete and accurate grapheme-to...


The Intonational Phonology of Maltese (ICPhS Satellite Workshop on “Intonational Phonology of Understudied or Fieldwork Languages”)

This paper reports on work which is currently being carried out to consolidate the phonological analysis of Maltese within the Autosegmental-Metrical framework used in other work on intonation and in previous work of the author. Consolidation of the phonological analysis of prosodic structure and intonation in Maltese is expected to go hand in hand with annotation of data from a small corpus of ...




Publication date: 2012